NSF PAR Search | NSF Public Access Repository

Enhancing value function estimation through first-order state-action dynamics in offline reinforcement learning

Lien, Yun-Hsuan; Hsieh, Ping-Chun; Li, Tzu-Mao; Wang, Yu-Shuen (June 2024, International Conference on Machine Learning)

In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman equation often encounters challenges due to the limited scope of available data. This limitation stems from the Bellman equation itself, which cannot accurately predict the value of unvisited states. To address this issue, we introduce a method that bridges continuous- and discrete-time RL, capitalizing on the advantages of both. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function's first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman equation in continuous-time RL. We provide practical algorithms for both deterministic and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating this first-order information significantly improves policy performance on offline RL problems.
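To make the link between the two equations concrete, the sketch below checks them against each other on a toy problem. It is not the authors' algorithm. The one-dimensional dynamics `f`, reward rate `r`, discount rate `RHO`, and closed-form `value` are all hypothetical choices made for illustration. For a fixed policy, the Hamilton-Jacobi-Bellman condition reads rho * V(s) = r(s) + V'(s) * f(s), and it is the first-order limit of the discrete-time Bellman equation V(s) = r(s) * dt + gamma * V(s') with gamma = exp(-rho * dt).

```python
import numpy as np

RHO = 0.5  # hypothetical continuous-time discount rate

def f(s):
    # Toy closed-loop dynamics ds/dt under a fixed policy (assumption).
    return -s

def r(s):
    # Toy reward rate (assumption).
    return -s**2

def value(s):
    # Closed-form value for this toy problem: integral of exp(-RHO*t) * r(s(t)) dt.
    return -s**2 / (RHO + 2.0)

def value_grad(s):
    # First derivative of the value function with respect to the state.
    return -2.0 * s / (RHO + 2.0)

def hjb_residual(s):
    # Continuous-time condition: r + V' * f - rho * V should be zero.
    return r(s) + value_grad(s) * f(s) - RHO * value(s)

def bellman_residual(s, dt):
    # Discrete-time condition with an Euler step of size dt.
    gamma = np.exp(-RHO * dt)
    s_next = s + f(s) * dt
    return r(s) * dt + gamma * value(s_next) - value(s)

states = np.linspace(-2.0, 2.0, 9)
hjb = hjb_residual(states)
bell = bellman_residual(states, dt=1e-3)
```

On this example the HJB residual vanishes up to floating-point error, while the Bellman residual is small but nonzero because of the Euler step and shrinks as `dt` decreases. A learned value function could be penalized on a residual like `hjb_residual` in addition to the usual Bellman error, which is one plausible reading of how first-order information constrains values near the data.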

